Ising models for networks of real neurons 
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Ising models with pairwise interactions are the least structured, or maximum-entropy, probability 
distributions that exactly reproduce measured pairwise correlations between spins. Here we use this 
equivalence to construct Ising models that describe the correlated spiking activity of populations 
of 40 neurons in the retina, and show that pairwise interactions account for observed higher-order 
correlations. By first finding a representative ensemble for observed networks we can create synthetic 
networks of 120 neurons, and find that with increasing size the networks operate closer to a critical 
point and start exhibiting collective behaviors reminiscent of spin glasses. 

PACS numbers: 87.18.Sn, 87.19.Dd, 89.70.+C 



Physicists have long explored analogies between the 
statistical mechanics of Ising models and the functional 
dynamics of neural networks [l], Q ■ Recently it has been 
suggested that this analogy can be turned into a precise 
mapping Q: In small windows of time, a single neuron i 
either does (tTi = +1) or does not (o-j = —1) generate an 
action potential or "spike" Q ; if we measure the mean 
probability of spiking for each cell ((ci)) and the correla- 
tions between pairs of cells (Cy = (criCTj) — ((Ti)((Tj)), then 
the maximum entropy model consistent with these data 
is exactly the Ising model 
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where the magnetic fields {hi} and the exchange cou- 
plings { Jij} have to be set to reproduce the measured val- 
ues of {(ui)} and {Cij}. We recall that maximum entropy 
models are the least structured models consistent with 
known expectation values d, Q ; thus the Ising model is 
the minimal model forced upon us by measurements of 
mean spike probabilities and pairwise correlations. 

The surprising result of Ref Q is that the Ising model 
provides a very accurate description of the combinatorial 
patterns of spiking and silence in retinal ganglion cells 
as they respond to natural movies, despite the fact that 
the model explicitly discards all higher order interactions 
among multiple cells. This detailed comparison of the- 
ory and experiment was done for groups of iV ~ 10 neu- 
rons, which are small enough that the full distribution 
P({(Ti}) can be sampled experimentally. Here we extend 
these results to iV = 40, and then argue that the ob- 
served network is typical of an ensemble out of which we 
can construct larger networks. Remarkably, these larger 
networks seem to be poised very close to a critical point, 
and exhibit other collective behaviors which should be- 
come visible in the next generation of experiments. 

To be concrete, we consider the salamander retina re- 
sponding to naturalistic movie clips, as in the experi- 



ments of Refs Under these conditions, pairs of 

cells within ~ 200 /xm of each other have correlations 
drawn from a homogeneous distribution; the correlations 
decline at larger distance Q. This correlated patch con- 
tains N ~ 200 neurons, of which we record from iV = 40 
[i| ; experiments typically run for ^ 1 hr [lo| . 

The central problem is to find the magnetic fields and 
exchange interactions that reproduce the observed pair- 
wise correlations. It is convenient to think of this problem 
more generally: We have a set of operators 0^({cri}) on 
the state of the system, and we consider a class of models 
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our problem is to find the coupling constants g that gen- 
erate the correct expectation values, which is equivalent 
to solving the equations dlnZ{g)/dg^ = (0^({cri})}oxpt- 
Up to ~ 20 cells we can solve exactly, but this ap- 
proach does not scale to = 40 and beyond. For larger 
systems, this "inverse Ising problem" or Boltzmann ma- 
chine learning, as it is known in computer science llj. 



is a hard computational problem rarely encountered in 
physics, where we usually compute properties of the sys- 
tem given a known model of the interactions. 

Given a set of coupling constants g, we can estimate 
the expectation values (0/j)g by Monte Carlo simulation. 
Increasing the coupling will increase the expectation 
value (Op,), so a plausible algorithm for learning g is to 
increase each in proportion to the deviation of (O^) 
(as estimated by Monte Carlo) from its target value (as 
estimated from experiment). This is not a true gradi- 
ent ascent, since changing has an impact on operators 
{Oy^fj), but such an iteration scheme docs have the cor- 
rect fixed points; heuristic improvements include a slow- 
ing of the learning rate with time and the addition of 
some 'inertia', so that we update g^^ according to 



A.g^(i + 1) = -77(t) (O^)g(t) - (Op)cxpt +aA(7^(t), 
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FIG. 1: (a) Precision of the Ising model learned via Eq Q: 
measured covariance elements are binned on the x-axis and 
plotted against the corresponding reconstructed covariances 
on y-axis; vertical error bars denote the deviation within the 
bin and horizontal error bars denote the bootstrap errors on 
covariance estimates, (b) Zoom-in for small Cij, with scale 
bar representing the distribution of covariances from shuffled 
data. Not shown are the reconstructions of the means, {as), 
which are accurate to better than 1%. (c) Distribution of 
coupling constants Jij. (d) Measured vs predicted connected 
three-point correlations for 40 neurons (red) and exact solu- 
tion for a 20 neuron subset (black), (e) Probability of ob- 
serving K simultaneous spikes, compared to the failure of the 
independent model (black line). Dashed lines show error es- 
timates. 



where r\[t) is the time-dependent learning rate and a 
measures the strength of the inertial term [12| . 

Figure [T] shows the success of the learning algorithm by 
comparing the measured pairwise correlations to those 
computed from the inferred Ising model for 40 neurons. 
To verify that the pairwise Hamiltonian captures essen- 
tial features of the data, we predict and then check statis- 
tics that are sensitive to higher order structure: the prob- 
ability P{K) of patterns with K simultaneous spikes, 
connected triplet correlations and the distribution of en- 
ergies (latter not shown). The model overestimates the 
significant 3-point correlations by about 7% and gener- 
ates small deviations in P{K); most notably it under- 



estimates the no-spike pattern, Pexpt(-f'' = 0) = 0.550 
vs. Pising(^ = 0) = 0.502. These deviations are small, 
however, and it seems fair to conclude that the pairwise 
Ising model captures the structure of the A'^ = 40 neuron 
system very well. Smaller groups of neurons for which ex- 
act pairwise models are cornputable also show excellent 
agreement with the data [1, • 

It is surprising that pairwise models work well both on 
= 40 neurons and on smaller subsets of these: not ob- 
serving will induce a triplet interaction among neurons 
{aa,<Tf3,aj} for any triplet in which there were pairwise 
couplings between a-^ and all triplet members. Moreover, 
comparison of the parameters in g^^") with their corre- 
sponding averages from different subnets g^^") leaves the 
exchange interactions almost unchanged, while magnetic 
fields change substantially. To explain both phenomena, 
we examine the flow of the couplings under decimation. 
Specifically, we include three-body interactions, isolate 
terms related to spin a^, sum over an, expand in Jin, Jijn 
up to O(cr^), and then identify renormalized couplings: 
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where Jin = Ji„ - J2j Aj = ^in Jjn(l - t^^) + wJijn 
and oj = tanh(/in — J^i^in + ^X^ij-^yn)- The terms 
7, (5 (X (1 — [jj^) originate from terms with 3 and 4 factors 
of (7, respectively. The key point is that neurons spike 
very infrequently (on average in ~ 2.4% of the bins) and 
so (cTi) « —1, in which case to is approximately the hy- 
perbolic tangent of the mean field at site n and is close 
to —1. If pairwise Ising is a good model at size N, and 
couplings are small enough to permit expansion, then at 
size (N — 1) the corrections to pairwise terms, as well as 
Jijk, are suppressed by 1 — w^. This could explain the 
dominance of pairwise interactions: it is not that higher 
order terms are intrinsically small, but the fact that spik- 
ing is rare means that they do not have much chance to 
contribute. Thus, the pairwise approximation is more 
like a Mayer cluster or virial expansion than like simple 
perturbation theory. 

We test these ideas by selecting 100 random subgroups 
of 10 neurons out of 20; for each, we compute the ex- 
act Ising model from the data, as well as applying Eqs 
(|4H6]) 10 times in succession to decimate the network 
from 20 cells down to the chosen 10. The resulting 
three-body interactions Jyk have a mean and standard 
deviation ten times smaller than the pairwise Jy . If 
we ignore these terms, the average Jensen-Shannon di- 
vergence between this probability distribution and 
the best pairwise model for the iV = 10 subgroups is 
Djs = 9.3 ± 5.4 X 10~^bits, which is smaller than the 
average divergence between either model and the exper- 
imental data and means that ^ 10^ samples would be 
required to distinguish reliably between the two models. 
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FIG. 2: C{T) for systems of difFerent sizes. Ising models were 
constructed for 400 subnetworks of size 5, 180 of size 10, 90 
of size 15 and 20, 1 full network of size 40 (all from data), 
and 3 synthetic networks of size 120; vertical error bars are 
standard deviations across these examples. The mean of the 
heat capacity curve and the 1 sigma envelope for Ising models 
of randomized networks are shown in blue dashed lines. 



Thus, sparsity of spikes keeps the complexity in check. 

Given a model with couplings g, we can explore the 
statistical mechanics of models with g — > g/T. In par- 
ticular, this exercise might reveal if the actual operating 
point (T = 1) is in any way privileged. Tracking the 
specific heat vs T also gives us a way of estimating the 
entropy at T = 1, which measures the capacity of the 
neurons to convey information about the visual world; 
we recall that S{T = 1) = /g^ C{T)/TdT, and the heat 
capacity can be estimated by Monte Carlo from the vari- 
ance of the energy, C{T) = {{SE)^)/T^. 

Figure [2] shows the dependence of heat capacity on 
temperature at various system sizes. We note that the 
peak of the heat capacity moves towards the operating 
point with increasing size. The behavior of the heat ca- 
pacity C(T) is diagnostic for the underlying density of 
states, and offers us the chance to ask if the networks we 
observe in the retina are typical of some statistical ensem- 
ble. One could generate such an ensemble by randomly 
choosing the matrix elements Jy from the distribution 
that characterizes the real system, but models generated 
in this way have wildly diff'erent values of {(Ji). An alter- 
native is to consider that these expectation values, as well 
as the pairwise correlations Cy , are drawn independently 
out of a distribution, and then we construct Ising model 
consistent with these randomly assigned expectation val- 
ues. Figure [5] shows C(T) for networks of 20 neurons 
constructed in this way |16j . and we see that, within er- 
ror bars, the behavior of these randomly chosen systems 
resembles that of real 20 neuron groups in the retina. 

Armed with the results at = 20, we generated 



several synthetic networks of 120 neurons by randomly 
choosing once more out of the distribution of (tTi) and Cij 
observed experimentally [17[ • The heat capacity C120 (T) 
now has a dramatic peak at T* — 1.07±0.02, very close to 
the operating point at T = 1. If we integrate to find the 
entropy, we find that the independent entropy of the indi- 
vidual spins, 5*0(120) = 17.8 ± 0.2 bits, has been reduced 
to 5(120) = 10.7 ± 0.2 bits. Even at A^ = 120 the en- 
tropy deficit or multi-information /(A) = 5o(A^) — 'S'(A) 
continues to grow in proportion to the number of pairs 
A^^), continuing the pattern found in smaller networks 
Looking in detail at the model, the distribution of 
Jij is approximately Gaussian J = —0.016 ± 0.004 and 
a J = 0.61 ± 0.04; 53% of triangles are frustrated (46% at 
A^ = 40) , indicating the possibility of many stable states, 
as in spin glasses [18[. We examine these next. 

At A^ = 40 we find 4 local energy minima {G2, ■ ■ • , 
in the observed sample that are stable against single spin 
flips, in addition to the silent state Qi (cri = —1 for all 
i). Using zero-temperature Monte Carlo, each configu- 
ration observed in the experimental data is assigned to 
its corresponding stable state. Although this assignment 
makes no reference to the visual stimulus, the collective 
states Ga are reproducible across multiple presentations 
of the same movie (Fig [3K) , even when the microscopic 
state {(Ti} varies substantially (Fig[5]3). 

At A^ — 120, we find a much richer structure [lit : 
the Gibbs state now is a superposition of thousands of 
Ga, with a nearly Zipf-like distribution (Fig[3j:). The 
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FIG. 3: (a) Probability that the 40 neuron system is found 
in a configuration within the basin of each nontrivial ground 
state Qa, as a function of time during the stimulus movie; 
P{Ga\t) = 0.4 means that the retina is in that basin on 40% 
of the 145 repetitions of the movie, (b) All unique patterns 
assigned to Q5 at t = 10.88 — 10.92 s. (c) Zipf plot of the rank 
ordered frequencies with which the lowest lying 5 • 10* stable 
states are found in the simulated 120 neuron system. 
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entropy of this distribution is 3.4 ±0.3 bits, about a third 
of the total entropy. Thus, a substantial fraction of the 
network's capacity to convey visual information would 
be carried by the collective state, that is by the identity 
of the basin of attraction, rather than by the detailed 
microscopic states. 

To summarize, the Ising model with pairwise inter- 
actions continues to provide an accurate description of 
neural activity in the retina up to iV = 40. Although 
correlations among pairs of cells are weak, the behav- 
ior of these large groups of cells is strongly collective, 
and this is even clearer in larger networks that were con- 
structed to be typical of the ensemble out of which the 
observed network has been drawn. In particular, these 
networks seem to be operating close to a critical point. 
Such tuning might serve to maximize the system's sus- 
ceptibility to sensory inputs, as suggested in other sys- 
tems [i^l ; by definition operating at a peak of the specific 
heat maximizes the dynamic range of log probabilities for 
the different microscopic states, allowing the system to 
represent sensory events that occur with a wide range 
of likelihoods '21'|. The observed correlations are not 
fixed by the anatomy of the retina or by the visual in- 
put alone, but reflect adaptation to the statistics of these 
inputs [221 ; it should be possible to test experimentally 
whether these adaptation processes preserve the tuning 
to a critical point as the input statistics are changed. 
Finally, the transition from = 40 to iV = 120 opens 
up a much richer structure to the configuration space, 
suggesting that the representation of the visual world by 
the relevant groups of 200 cells may be completely 
dominated by collective states that are invisible to ex- 
periments on smaller systems. 
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